Measuring Over-Generalization in the Minimal Multiple Generalizations of Biosequences
Authors
Abstract
We consider the problem of finding a set of patterns that best characterizes a set of strings. To this end, Arimura et al. [3] considered the use of minimal multiple generalizations (mmgs) for such characterizations. Given a sample set, the mmgs are, roughly speaking, the most (syntactically) specific sets of languages, within a given class of languages, that contain the sample. Takae et al. [17] found that the mmgs of the class of pattern languages [1] with so-called sort symbols are fairly accurate predictors of signal peptides. We first reproduce their results using updated data. Then, using a measure that estimates the degree of over-generalization made by the mmgs, we present results that explain the high accuracy obtained with sort symbols, and discuss how better results can be obtained. The measure we suggest here can also be applied to other types of patterns, e.g., the PROSITE patterns [4].
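To make sort symbols and over-generalization concrete, here is a minimal Python sketch. The pattern encoding, the names `matches`, `coverage`, `VAR`, and the sort `H` are hypothetical. The sketch covers only regular patterns (each variable occurs once) with non-erasing variables, and `coverage`, the fraction of all strings of the pattern's length that a variable-free pattern accepts, is a simple illustrative indicator, not the over-generalization measure proposed in this paper.

```python
from math import prod

# Hypothetical encoding (not the authors' data structures):
#   - a frozenset of letters is a sort symbol and matches exactly one letter
#     from that set (a constant letter c is the singleton frozenset({c}));
#   - the string VAR is a variable and matches any nonempty string.
# Every VAR is treated as a distinct variable, so only regular patterns
# (each variable occurring once) are covered.
VAR = "*"


def matches(pattern, s):
    """Return True if s belongs to the language of the regular sort pattern."""
    n = len(s)
    # reach[j] is True when the pattern items seen so far can match s[:j].
    reach = [True] + [False] * n
    for item in pattern:
        nxt = [False] * (n + 1)
        for j in range(n + 1):
            if not reach[j]:
                continue
            if item == VAR:
                # A variable consumes one or more letters.
                for k in range(j + 1, n + 1):
                    nxt[k] = True
            elif j < n and s[j] in item:
                # A sort symbol consumes exactly one letter from its set.
                nxt[j + 1] = True
        reach = nxt
    return reach[n]


def coverage(pattern, alphabet_size):
    """Fraction of all strings of length len(pattern) accepted by a
    variable-free sort pattern: a crude, illustrative indicator only."""
    if VAR in pattern:
        raise ValueError("coverage is only defined here for variable-free patterns")
    return prod(len(item) for item in pattern) / alphabet_size ** len(pattern)


# Usage: an illustrative "hydrophobic" sort over the 20 amino acids.
H = frozenset("AILMFVW")
p = [frozenset("M"), H, H, VAR]
print(matches(p, "MALK"))  # True
print(matches(p, "MAL"))   # False: the variable needs at least one letter
print(coverage([frozenset("M"), H, H], 20))  # 0.006125 (= 49 / 8000)
```

A pattern whose sorts are all the full alphabet has coverage 1, i.e., it accepts every string of its length, whereas a pattern of singleton sorts accepts exactly one.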
Similar resources
Knowledge Discovery in Biosequences Using Sort Regular Patterns
This paper considers knowledge discovery using sort regular patterns, which are strings over sort letters representing finite sets of basic letters. We devise a learning algorithm for this class based on the minimal multiple generalization technique, and evaluate the method by experiments on biosequences from the GenBank database. The experiments show that a relatively simple sort pattern can represent a...
Finding Minimal Multiple Generalization over Regular Patterns with Alphabet Indexing
We propose a learning algorithm that discovers, from biosequences, a motif represented by patterns together with an alphabet indexing. Using only positive examples and with the help of an alphabet indexing, the algorithm finds k regular patterns as a k-minimal multiple generalization (k-mmg for short). The computational results for transmembrane domains indicate that the combination of k-mmg and alphabet indexing...
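The alphabet-indexing step can be sketched as follows, assuming a hypothetical two-letter indexing `psi` that sends an illustrative hydrophobic set to `'1'` and every other amino acid to `'0'` (not the indexing used in that paper). Sequences are rewritten letter by letter and then matched against a regular pattern over the indexed alphabet, where `*` is a variable; the helper names are hypothetical and the k-mmg search itself is not shown.

```python
import re

AMINO = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical indexing for illustration only: hydrophobic -> '1', others -> '0'.
HYDROPHOBIC = set("AILMFVW")
psi = {a: ("1" if a in HYDROPHOBIC else "0") for a in AMINO}


def index_sequence(seq, indexing):
    """Apply an alphabet indexing letter by letter (KeyError on unknown letters)."""
    return "".join(indexing[c] for c in seq)


def pattern_to_regex(pattern):
    """Translate a regular pattern over the indexed alphabet into a regex:
    '*' is a variable matching any nonempty string, other characters are constants."""
    return "".join(".+" if c == "*" else re.escape(c) for c in pattern)


def matches_indexed(pattern, seq, indexing):
    """True if the indexed sequence belongs to the language of the pattern."""
    return re.fullmatch(pattern_to_regex(pattern), index_sequence(seq, indexing)) is not None


print(index_sequence("MALKD", psi))             # 11100
print(matches_indexed("*111*", "KMALKD", psi))  # True
print(matches_indexed("111*", "KMALK", psi))    # False
```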
Some Generalizations of Locally Closed Sets
Arenas et al. [1] introduced the notion of lambda-closed sets as a generalization of locally closed sets. In this paper, we introduce the notions of lambda-locally closed sets, Lambda_lambda-closed sets and lambda_g-closed sets and obtain some decompositions of closed sets and continuity in topological spaces.
A new characterization for Meir-Keeler condensing operators and its applications
Darbo's fixed point theorem and its generalizations play a crucial role in proving the existence of solutions of integral equations. Meir-Keeler condensing operators are a generalization of Darbo's fixed point theorem, and most other generalizations are special cases of this result. In recent years, some authors have applied these generalizations to solve several special integral equations and some of the...
Some Generalizations of Weak Convergence Results on Multiple Channel Queues in Heavy Traffic
This paper extends certain results of Iglehart and Whitt on multiple channel queues to the case where the inter-arrival times and service times are not necessarily identically distributed. It is shown that the weak convergence results in this case are exactly the same as those obtained by Iglehart and Whitt.
Journal title:
Volume, Issue
Pages: -
Publication date: 2005